Goto

Collaborating Authors

 decision-focused learning



Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning

Neural Information Processing Systems

In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.


Decision-Focused Learning without Decision-Making: Learning Locally Optimized Decision Losses

Neural Information Processing Systems

Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better \textit{on that specific task}. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past work has largely gotten around this this issue by \textit{handcrafting} task-specific surrogates to the original optimization problem that provide informative gradients when differentiated through. However, the need to handcraft surrogates for each new task limits the usability of DFL. In addition, there are often no guarantees about the convexity of the resulting surrogates and, as a result, training a predictive model using them can lead to inferior local optima. In this paper, we do away with surrogates altogether and instead \textit{learn} loss functions that capture task-specific information. To the best of our knowledge, ours is the first approach that entirely replaces the optimization component of decision-focused learning with a loss that is automatically learned. Our approach (a) only requires access to a black-box oracle that can solve the optimization problem and is thus \textit{generalizable}, and (b) can be \textit{convex by construction} and so can be easily optimized over. We evaluate our approach on three resource allocation problems from the literature and find that our approach outperforms learning without taking into account task-structure in all three domains, and even hand-crafted surrogates from the literature.


A Dual Perspective on Decision-Focused Learning: Scalable Training via Dual-Guided Surrogates

Rodriguez-Diaz, Paula, Paulson, Kirk Bansak Elisabeth

arXiv.org Artificial Intelligence

Many real-world decisions are made under uncertainty by solving optimization problems using predicted quantities. This predict-then-optimize paradigm has motivated decision-focused learning, which trains models with awareness of how the optimizer uses predictions, improving the performance of downstream decisions. Despite its promise, scaling is challenging: state-of-the-art methods either differentiate through a solver or rely on task-specific surrogates, both of which require frequent and expensive calls to an optimizer, often a combinatorial one. In this paper, we leverage dual variables from the downstream problem to shape learning and introduce Dual-Guided Loss (DGL), a simple, scalable objective that preserves decision alignment while reducing solver dependence. We construct DGL specifically for combinatorial selection problems with natural one-of-many constraints, such as matching, knapsack, and shortest path. Our approach (a) decouples optimization from gradient updates by solving the downstream problem only periodically; (b) between refreshes, trains on dual-adjusted targets using simple differentiable surrogate losses; and (c) as refreshes become less frequent, drives training cost toward standard supervised learning while retaining strong decision alignment. We prove that DGL has asymptotically diminishing decision regret, analyze runtime complexity, and show on two problem classes that DGL matches or exceeds state-of-the-art DFL methods while using far fewer solver calls and substantially less training time. Code is available at https://github.com/


Appendix A Proofs

Neural Information Processing Systems

The first part of the proof follows the policy gradient theorem. This concludes the proof of Theorem 1.Theorem 2 Since the second-order derivative formulation as stated in Theorem 1 and Theorem 2 are both unbiased derivative estimate. The randomly initiated neural network uses ReLU layers as nonlinearity followed by a linear layer in the end. In order to train the optimal policy, in the gridworld example, we use tabular value-iteration algorithm to learn the Q value of each state action pair. So the number of available actions is 5, while the number of available states is 5 5 = 25 .



Decision-Focused Learning with Directional Gradients

Neural Information Processing Systems

We propose a novel family of decision-aware surrogate losses, called Perturbation Gradient (PG) losses, for the predict-then-optimize framework. These losses directly approximate the downstream decision loss and can be optimized using off-the-shelf gradient-based methods. Importantly, unlike existing surrogate losses, the approximation error of our PG losses vanishes as the number of samples grows. This implies that optimizing our surrogate loss yields a best-in-class policy asymptotically, even in misspecified settings. This is the first such result in misspecified settings and we provide numerical evidence confirming our PG losses substantively outperform existing proposals when the underlying model is misspecified and the noise is not centrally symmetric.


Online Decision-Focused Learning

Capitaine, Aymeric, Haddouche, Maxime, Moulines, Eric, Jordan, Michael I., Boursier, Etienne, Durmus, Alain

arXiv.org Machine Learning

Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. This end-to-end strategy holds promise for tackling complex combinatorial problems; however, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging because the objective function has zero or undefined gradients -- which prevents the use of standard first-order optimization methods -- and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) make use of the optimism principle, based on a near-optimal oracle along with an appropriate perturbation. This leads to a practical online algorithm for which we establish bounds on the expected dynamic regret, both when the decision space is a simplex and when it is a general bounded convex polytope. Finally, we demonstrate the effectiveness of our algorithm by comparing its performance with a classic prediction-focused approach on a simple knapsack experiment.


Decision-Focused Fine-Tuning of Time Series Foundation Models for Dispatchable Feeder Optimization

Beichter, Maximilian, Friederich, Nils, Pinter, Janik, Werling, Dorina, Phipps, Kaleb, Beichter, Sebastian, Neumann, Oliver, Mikut, Ralf, Hagenmeyer, Veit, Heidrich, Benedikt

arXiv.org Machine Learning

Time series foundation models provide a universal solution for generating forecasts to support optimization problems in energy systems. Those foundation models are typically trained in a prediction-focused manner to maximize forecast quality. In contrast, decision-focused learning directly improves the resulting value of the forecast in downstream optimization rather than merely maximizing forecasting quality. The practical integration of forecast values into forecasting models is challenging, particularly when addressing complex applications with diverse instances, such as buildings. This becomes even more complicated when instances possess specific characteristics that require instance-specific, tailored predictions to increase the forecast value. To tackle this challenge, we use decision-focused fine-tuning within time series foundation models to offer a scalable and efficient solution for decision-focused learning applied to the dispatchable feeder optimization problem. To obtain more robust predictions for scarce building data, we use Moirai as a state-of-the-art foundation model, which offers robust and generalized results with few-shot parameter-efficient fine-tuning. Comparing the decision-focused fine-tuned Moirai with a state-of-the-art classical prediction-focused fine-tuning Morai, we observe an improvement of 9.45% in average total daily costs.


Decision-Focused Learning for Complex System Identification: HVAC Management System Application

Favaro, Pietro, Toubeau, Jean-François, Vallée, François, Dvorkin, Yury

arXiv.org Artificial Intelligence

As opposed to conventional training methods tailored to minimize a given statistical metric or task-agnostic loss (e.g., mean squared error), Decision-Focused Learning (DFL) trains machine learning models for optimal performance in downstream decision-making tools. We argue that DFL can be leveraged to learn the parameters of system dynamics, expressed as constraint of the convex optimization control policy, while the system control signal is being optimized, thus creating an end-to-end learning framework. This is particularly relevant for systems in which behavior changes once the control policy is applied, hence rendering historical data less applicable. The proposed approach can perform system identification - i.e., determine appropriate parameters for the system analytical model - and control simultaneously to ensure that the model's accuracy is focused on areas most relevant to control. Furthermore, because black-box systems are non-differentiable, we design a loss function that requires solely to measure the system response. We propose pre-training on historical data and constraint relaxation to stabilize the DFL and deal with potential infeasibilities in learning. We demonstrate the usefulness of the method on a building Heating, Ventilation, and Air Conditioning day-ahead management system for a realistic 15-zone building located in Denver, US. The results show that the conventional RC building model, with the parameters obtained from historical data using supervised learning, underestimates HVAC electrical power consumption. For our case study, the ex-post cost is on average six times higher than the expected one. Meanwhile, the same RC model with parameters obtained via DFL underestimates the ex-post cost only by 3%.